THERMAL TOLERANT AVICELASE FROM ACIDOTHERMUS CELLULOLYTICUS

Government Interests 

The United States Government has rights in this invention under Contract No. 
DE-AC36-99GO10337 between the United States Department of Energy and the National 
Renewable Energy Laboratory, a Division of the Midwest Research Institute. 

Field of the Invention 

The invention generally relates to a novel avicelase from Acidothermus cellulolyticus, AviIII.
More specifically, the invention relates to purified and isolated AviIII polypeptides, nucleic acid
molecules encoding the polypeptides, and processes for production and use of AviIII, as well as
variants and derivatives thereof.

Background of the Invention 

Plant biomass as a source of energy production can include agricultural and forestry products, 
associated by-products and waste, municipal solid waste, and industrial waste. In addition, over 
50 million acres in the United States are currently available for biomass production, and there are 
a number of terrestrial and aquatic crops grown solely as a source for biomass (A Wiselogel, et al. 
Biomass feedstocks resources and composition. In CE Wyman, ed. Handbook on Bioethanol: 
Production and Utilization. Washington, DC: Taylor & Francis, 1996, pp 105-118). Biofuels 
produced from biomass include ethanol, methanol, biodiesel, and additives for reformulated 
gasoline. Biofuels are desirable because they add little, if any, net carbon dioxide to the
atmosphere and because they greatly reduce ozone formation and carbon monoxide emissions as
compared to the environmental output of conventional fuels (P Bergeron. Environmental
impacts of bioethanol. In CE Wyman, ed. Handbook on Bioethanol: Production and Utilization. 
Washington, DC: Taylor & Francis, 1996, pp 90-103). 

Plant biomass is the most abundant source of carbohydrate in the world due to the lignocellulosic 
materials composing the cell walls of all higher plants. Plant cell walls are divided into two 
sections, the primary and the secondary cell walls. The primary cell wall, which provides
structure for expanding cells (and hence changes as the cell grows), is composed of three major
polysaccharides and one group of glycoproteins. The predominant polysaccharide, and most
abundant source of carbohydrates, is cellulose, while hemicellulose and pectin are also found in
abundance. Cellulose is a linear beta-(1,4)-D-glucan and comprises 20% to 30% of the primary
cell wall by weight. The secondary cell wall, which is produced after the cell has completed
growing, also contains polysaccharides and is strengthened through polymeric lignin covalently
cross-linked to hemicellulose.

Carbohydrates, and cellulose in particular, can be converted to sugars by well-known methods
including acid and enzymatic hydrolysis. Enzymatic hydrolysis of cellulose requires the processing
of biomass to reduce size and facilitate subsequent handling. Mild acid treatment is then used to
hydrolyze part or all of the hemicellulose content of the feedstock. Finally, cellulose is converted
to ethanol through the concerted action of cellulases and saccharolytic fermentation
(simultaneous saccharification fermentation (SSF)). The SSF process, using the yeast
Saccharomyces cerevisiae for example, is often incomplete, as it does not utilize the entire sugar
content of the plant biomass, namely the hemicellulose fraction.

The cost of producing ethanol from biomass can be divided into three areas of expenditure:
pretreatment costs, fermentation costs, and other costs. Pretreatment costs include biomass
milling, pretreatment reagents, equipment maintenance, power and water, and waste
neutralization and disposal. The fermentation costs can include enzymes, nutrient supplements,
yeast, maintenance and scale-up, and waste disposal. Other costs include biomass purchase,
transportation and storage, plant labor, plant utilities, ethanol distillation, and administration
(which may include technology-use licenses). One of the major expenses incurred in SSF is the
cost of the enzymes, as about one kilogram of cellulase is required to fully digest 50 kilograms of
cellulose. Economical production of cellulase is also complicated by factors such as the
relatively slow growth rates of cellulase-producing organisms, levels of cellulase expression, and
the tendency of enzyme-dependent processes to partially or completely inactivate enzymes due to
conditions such as elevated temperature, acidity, proteolytic degradation, and solvent
degradation.
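
The enzyme loading quoted above (about one kilogram of cellulase per 50 kilograms of cellulose)
can be illustrated with a short calculation. The following is a minimal sketch only; the function
name and the 1,000 kg batch size are hypothetical and are not taken from the disclosure.

```python
# Illustrative arithmetic only. The 1:50 ratio is the figure quoted in the text;
# the function name and the example batch size are hypothetical.
CELLULOSE_KG_PER_ENZYME_KG = 50.0  # about 50 kg cellulose digested per kg cellulase


def cellulase_required_kg(cellulose_kg: float) -> float:
    """Return the approximate mass of cellulase needed to fully digest cellulose_kg."""
    if cellulose_kg < 0:
        raise ValueError("cellulose mass must be non-negative")
    return cellulose_kg / CELLULOSE_KG_PER_ENZYME_KG


# A hypothetical 1,000 kg batch of cellulose would need about 20 kg of enzyme.
print(cellulase_required_kg(1000.0))  # 20.0
```

At this ratio the enzyme mass scales linearly with the cellulose processed, which is why enzyme
cost features so prominently among the fermentation costs.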

Enzymatic degradation of cellulose requires the coordinate action of at least three different types
of cellulases. Such enzymes are given an Enzyme Commission (EC) designation according to the
Nomenclature Committee of the International Union of Biochemistry and Molecular Biology
(Eur. J. Biochem. 264: 607-609 and 610-650, 1999). Endo-beta-(1,4)-glucanases (EC 3.2.1.4)
cleave the cellulose strand randomly along its length, thus generating new chain ends.
Exo-beta-(1,4)-glucanases (EC 3.2.1.91) are processive enzymes and cleave cellobiosyl units
(beta-(1,4)-glucose dimers) from free ends of cellulose strands. Lastly, beta-D-glucosidases
(cellobiases: EC 3.2.1.21) hydrolyze cellobiose to glucose. All three of these general activities are
required for efficient and complete hydrolysis of a polymer such as cellulose to a subunit, such as
the simple sugar, glucose.

Highly thermostable enzymes have been isolated from the cellulolytic thermophile Acidothermus
cellulolyticus gen. nov., sp. nov., a bacterium originally isolated from decaying wood in an acidic,
thermal pool at Yellowstone National Park. A. Mohagheghi et al., (1986) Int. J. Systematic
Bacteriology, 36(3): 435-443. One cellulase enzyme produced by this organism, the endoglucanase
E1, is known to display maximal activity at 75°C to 83°C. M.P. Tucker et al. (1989),
Bio/Technology, 7(8): 817-820. E1 endoglucanase has been described in U.S. Patent 5,275,944.
The A. cellulolyticus E1 endoglucanase is an active cellulase; in combination with the
exocellulase CBH I from Trichoderma reesei, E1 gives a high level of saccharification and
contributes to a degree of synergism. Baker JO et al. (1994), Appl. Biochem. Biotechnol., 45/46:
245-256. The gene coding E1 catalytic and carbohydrate binding domains and linker peptide was
described in U.S. Patent 5,536,655. E1 has also been expressed as a stable, active enzyme from a
wide variety of hosts, including E. coli, Streptomyces lividans, Pichia pastoris, cotton, tobacco,
and Arabidopsis (Dai Z, Hooker BS, Anderson DB, Thomas SR. Transgenic Res. 2000 Feb;
9(1):43-54).

The potential exists for the successful, commercial-scale expression of heterologous cellulases,
and in particular novel cellulases with or without any one or more desirable properties such as
thermal tolerance and resistance to acid inactivation, proteolytic inactivation, and solvent
inactivation. Such expression can occur in filamentous fungi, bacteria, and other hosts.

There is a need within the art to generate alternative cellulase enzymes capable of assisting in the
commercial-scale processing of cellulose to sugar for use in biofuel production. Against this
backdrop the present invention has been developed. The potential exists for the successful,
commercial-scale expression of heterologous cellulase polypeptides, and in particular novel
cellulase polypeptides with or without any one or more desirable properties such as thermal
tolerance, and partial or complete resistance to extreme pH inactivation, proteolytic inactivation,
solvent inactivation, chaotropic agent inactivation, oxidizing agent inactivation, and detergent
inactivation. Such expression can occur in fungi, bacteria, and other hosts.

Summary of the Invention

The present invention provides AviIII, a novel member of the glycoside hydrolase (GH) family of
enzymes, and in particular a thermal tolerant glycoside hydrolase useful in the degradation of
cellulose. AviIII polypeptides of the invention include those having an amino acid sequence
shown in SEQ ID NO:1, as well as polypeptides having substantial amino acid sequence identity
to the amino acid sequence of SEQ ID NO:1 and useful fragments thereof, including a catalytic
domain having significant sequence similarity to the GH74 family, a first carbohydrate binding
domain (type II) and a second carbohydrate binding domain (type III). See FIG. 1.

The invention also provides a polynucleotide molecule encoding AviIII polypeptides and
fragments of AviIII polypeptides, for example catalytic and carbohydrate binding domains.
Polynucleotide molecules of the invention include those molecules having a nucleic acid
sequence as shown in SEQ ID NO:2; those that hybridize to the nucleic acid sequence of SEQ ID
NO:2 under high stringency conditions; and those having substantial nucleic acid identity with
the nucleic acid sequence of SEQ ID NO:2.

The invention includes variants and derivatives of the AviIII polypeptides, including fusion
proteins. For example, fusion proteins of the invention include AviIII polypeptide fused to a
heterologous protein or peptide that confers a desired function. The heterologous protein or
peptide can facilitate purification, oligomerization, stabilization, or secretion of the AviIII
polypeptide, for example. As further examples, the heterologous polypeptide can provide
enhanced activity, including catalytic or binding activity, for AviIII polypeptides, where the
enhancement is either additive or synergistic. A fusion protein of an embodiment of the
invention can be produced, for example, from an expression construct containing a
polynucleotide molecule encoding AviIII polypeptide in frame with a polynucleotide molecule
for the heterologous protein. Embodiments of the invention also comprise vectors, plasmids,
expression systems, host cells, and the like, containing an AviIII polynucleotide molecule. Genetic
engineering methods for the production of AviIII polypeptides of embodiments of the invention
include expression of a polynucleotide molecule in cell free expression systems and in cellular
hosts, according to known methods.

The invention further includes compositions containing a substantially purified AviIII
polypeptide of the invention and a carrier. Such compositions are administered to a biomass
containing cellulose for the reduction or degradation of the cellulose.

The invention also provides reagents, compositions, and methods that are useful for analysis of
AviIII activity.

These and various other features as well as advantages which characterize the present invention 
will be apparent from a reading of the following detailed description and a review of the 
associated drawings. 

The following Tables 4 and 5 include sequences used in describing embodiments of the present
invention. In Table 4, the abbreviations are as follows: CD, catalytic domain; CBD_II,
carbohydrate binding domain type II; CBD_III, carbohydrate binding domain type III; and FN-III,
fibronectin domain type III. When used herein, N* indicates a string of unknown nucleic acid
units, and X* indicates a string of unknown amino acid units, for example about 50 or more.
Table 4 includes approximate start and stop information for segments, and Table 5 includes
amino acid sequence data for segments.

Brief Description of the Drawings

FIG. 1 is a schematic representation of the gene sequence and amino acid segment organization.
FIG. 2 is a graphic representation of the glycoside hydrolase gene/protein families found in
various organisms.

Detailed Description 

Definitions: 

The following definitions are provided to facilitate understanding of certain terms used frequently
herein and are not meant to limit the scope of the present disclosure:

"Amino acid" refers to any of the twenty naturally occurring amino acids as well as any modified
amino acid sequences. Modifications may include natural processes such as posttranslational
processing, or may include chemical modifications which are known in the art. Modifications
include but are not limited to: phosphorylation, ubiquitination, acetylation, amidation,
glycosylation, covalent attachment of flavin, ADP-ribosylation, cross linking, iodination,
methylation, and the like.

"Antibody" refers to a Y-shaped molecule having a pair of antigen binding sites, a hinge region
and a constant region. Fragments of antibodies, for example an antigen binding fragment (Fab),
chimeric antibodies, antibodies having a human constant region coupled to a murine antigen
binding region, and fragments thereof, as well as other well known recombinant antibodies are
included in the present invention.

"Antisense" refers to polynucleotide sequences that are complementary to a target "sense"
polynucleotide sequence.

"Binding activity" refers to any activity that can be assayed by characterizing the ability of a 
polypeptide to bind to a substrate. The substrate can be a polymer such as cellulose or can be a
complex molecule or aggregate of molecules where the entire moiety comprises at least some
cellulose.

"Cellulase activity" refers to any activity that can be assayed by characterizing the enzymatic
activity of a cellulase. For example, cellulase activity can be assayed by determining how much
reducing sugar is produced during a fixed amount of time for a set amount of enzyme (see Irwin
et al., (1998) J. Bacteriology, 1709-1714). Other assays are well known in the art and can be
substituted.
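
As one way to picture such an assay readout, a reducing-sugar measurement can be normalized to
the incubation time and the amount of enzyme. The sketch below is illustrative only: the units
(micromoles, minutes, milligrams), the function name, and the example numbers are assumptions
and are not drawn from the cited assay.

```python
def specific_activity(reducing_sugar_umol: float, minutes: float, enzyme_mg: float) -> float:
    """Micromoles of reducing sugar released per minute per milligram of enzyme.

    Hypothetical helper: the units are an assumed convention, not the cited protocol.
    """
    if minutes <= 0 or enzyme_mg <= 0:
        raise ValueError("time and enzyme amount must be positive")
    return reducing_sugar_umol / (minutes * enzyme_mg)


# Hypothetical numbers: 12 umol released in 60 minutes by 0.5 mg of enzyme.
print(specific_activity(12.0, 60.0, 0.5))  # 0.4
```

Normalizing in this way allows preparations assayed for different times or at different enzyme
loadings to be compared on a common basis.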



"Complementary" or "complementarity" refers to the ability of a polynucleotide in a 
polynucleotide molecule to form a base pair with another polynucleotide in a second 
polynucleotide molecule. For example, the sequence A-G-T is complementary to the sequence
T-C-A. Complementarity may be partial, in which only some of the polynucleotides match
according to base pairing, or complete, where all the polynucleotides match according to base
pairing.
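
The distinction between partial and complete complementarity can be sketched in a few lines of
code. This is a minimal illustration that compares two DNA sequences position by position, as in
the A-G-T / T-C-A example above; strand orientation is ignored, and the function names are
hypothetical.

```python
# Watson-Crick pairs for DNA. Position-by-position comparison follows the
# A-G-T / T-C-A example in the text; strand orientation is ignored here.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())


def complementarity(seq1: str, seq2: str) -> float:
    """Fraction of positions at which seq1 and seq2 can form a base pair."""
    if not seq1 or len(seq1) != len(seq2):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(PAIRS.get(a) == b for a, b in zip(seq1.upper(), seq2.upper()))
    return paired / len(seq1)


print(complement("AGT"))              # TCA
print(complementarity("AGT", "TCA"))  # 1.0 (complete)
print(complementarity("AGT", "TCG"))  # partial: 2 of 3 positions pair
```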

"Expression" refers to transcription and translation occurring within a host cell. The level of
expression of a DNA molecule in a host cell may be determined on the basis of either the amount
of corresponding mRNA that is present within the cell or the amount of protein encoded by the
DNA molecule that is produced by the host cell (Sambrook et al., 1989, Molecular cloning: A
Laboratory Manual, 18.1-18.88).

"Fusion protein" refers to a first protein having attached a second, heterologous protein.
Preferably, the heterologous protein is fused via recombinant DNA techniques, such that the first
and second proteins are expressed in frame. The heterologous protein can confer a desired
characteristic to the fusion protein, for example, a detection signal, enhanced stability or
stabilization of the protein, facilitated oligomerization of the protein, or facilitated purification of
the fusion protein. Examples of heterologous proteins useful in the fusion proteins of the
invention include molecules having one or more catalytic domains of AviIII, one or more binding
domains of AviIII, one or more catalytic domains of a glycoside hydrolase other than AviIII, one
or more binding domains of a glycoside hydrolase other than AviIII, or any combination thereof.
Further examples include immunoglobulin molecules and portions thereof, peptide tags such as
histidine tag (6-His), leucine zipper, substrate targeting moieties, signal peptides, and the like.
Fusion proteins are also meant to encompass variants and derivatives of AviIII polypeptides that
are generated by conventional site-directed mutagenesis and more modern techniques such as
directed evolution, discussed infra.

"Genetically engineered" refers to any recombinant DNA or RNA method used to create a 
prokaryotic or eukaryotic host cell that expresses a protein at elevated levels, at lowered levels, or
in a mutated form. In other words, the host cell has been transfected, transformed, or transduced
with a recombinant polynucleotide molecule, and thereby been altered so as to cause the cell to
alter expression of the desired protein. Methods and vectors for genetically engineering host cells
are well known in the art; for example various techniques are illustrated in Current Protocols in
Molecular Biology, Ausubel et al., eds. (Wiley & Sons, New York, 1988, and quarterly updates).
Genetic engineering techniques include but are not limited to expression vectors, targeted
homologous recombination and gene activation (see, for example, U.S. Patent No. 5,272,071 to
Chappel) and trans activation by engineered transcription factors (see, for example, Segal et al.,
1999, Proc Natl Acad Sci USA 96(6):2758-63).

"Glycoside hydrolase family" refers to a family of enzymes which hydrolyze the glycosidic bond
between two or more carbohydrates or between a carbohydrate and a non-carbohydrate moiety
(Henrissat B., (1991) Biochem. J., 280:309-316). Identification of a putative glycoside hydrolase
family member is made based on an amino acid sequence comparison and the finding of
significant sequence similarity within the putative member's catalytic domain, as compared to the
catalytic domains of known family members.

"Homology" refers to a degree of complementarity between polynucleotides, having significant 
effect on the efficiency and strength of hybridization between polynucleotide molecules. The 
term also can refer to a degree of similarity between polypeptides.

"Host cell" or "host cells" refers to cells expressing a heterologous polynucleotide molecule. 
Host cells of the present invention express polynucleotides encoding AviIII or a fragment thereof.
Examples of suitable host cells useful in the present invention include, but are not limited to,
prokaryotic and eukaryotic cells. Specific examples of such cells include bacteria of the genera
Escherichia, Bacillus, and Salmonella, as well as members of the genera Pseudomonas,
Streptomyces, and Staphylococcus; fungi, particularly filamentous fungi such as Trichoderma and
Aspergillus, Phanerochaete chrysosporium and other white rot fungi; also other fungi including
Fusaria, molds, and yeast including Saccharomyces sp., Pichia sp., and Candida sp. and the like;
plants e.g. Arabidopsis, cotton, barley, tobacco, potato, and aquatic plants and the like; SF9 insect
cells (Summers and Smith, 1987, Texas Agriculture Experiment Station Bulletin, 1555), and the
like. Other specific examples include mammalian cells such as human embryonic kidney cells
(293 cells), Chinese hamster ovary (CHO) cells (Puck et al., 1958, Proc. Natl. Acad. Sci. USA 60,
1275-1281), human cervical carcinoma cells (HELA) (ATCC CCL 2), human liver cells (Hep
G2) (ATCC HB8065), human breast cancer cells (MCF-7) (ATCC HTB22), human colon
carcinoma cells (DLD-1) (ATCC CCL 221), Daudi cells (ATCC CRL-213), murine myeloma
cells such as P3/NSI/1-Ag4-1 (ATCC TIB-18), P3X63Ag8 (ATCC TIB-9), SP2/0-Ag14 (ATCC
CRL-1581), and the like.

"Hybridization" refers to the pairing of complementary polynucleotides during an annealing
period. The strength of hybridization between two polynucleotide molecules is impacted by the
homology between the two molecules, stringency of the conditions involved, the melting
temperature of the formed hybrid and the G:C ratio within the polynucleotides.
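
To illustrate how G:C content bears on melting temperature, the sketch below computes the G+C
fraction of a sequence and applies the Wallace rule, a common rule of thumb for short
oligonucleotides (2°C per A or T, 4°C per G or C). The rule is brought in here purely for
illustration and is not part of the present disclosure; the function names and the example
18-mer are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must be non-empty")
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm_celsius(seq: str) -> int:
    """Rough melting temperature by the Wallace rule: 2*(A+T) + 4*(G+C).

    A rule of thumb for short oligonucleotides only; used here as an assumption.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


oligo = "ATGCGCGATATCGCGCAT"  # hypothetical 18-mer: 8 A/T and 10 G/C
print(wallace_tm_celsius(oligo))  # 56
```

Under this approximation, each G:C pair contributes twice as much to the estimated melting
temperature as an A:T pair, consistent with the role of the G:C ratio noted above.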

"Identity" refers to a comparison between pairs of nucleic acid or amino acid molecules.
Methods for determining sequence identity are known. See, for example, computer programs
commonly employed for this purpose, such as the Gap program (Wisconsin Sequence Analysis
Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison
Wisconsin), that uses the algorithm of Smith and Waterman, 1981, Adv. Appl. Math., 2: 482-489.
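
Once two sequences have been aligned by such a program, a percent identity can be computed by
counting matching columns. The following is a simplified sketch that takes two pre-aligned
strings of equal length; it does not reproduce the Gap program, and the counting convention
(identical columns divided by all compared columns, gaps included) is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumed convention: columns with a gap in only one sequence count as mismatches;
    columns with a gap in both sequences are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no columns to compare")
    return 100.0 * identical / compared


print(percent_identity("ACGT", "ACGA"))  # 75.0
```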

"Isolated" refers to a polynucleotide or polypeptide that has been separated from at least one 
contaminant (polynucleotide or polypeptide) with which it is normally associated. For example,
an isolated polynucleotide or polypeptide is in a context or in a form that is different from that in 
which it is found in nature. 

"Nucleic acid sequence" refers to the order or sequence of deoxyribonucleotides along a strand of 
deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino
acids along a polypeptide chain. The deoxyribonucleotide sequence thus codes for the amino 
acid sequence. 

"Polynucleotide" refers to a linear sequence of nucleotides. The nucleotides may be
ribonucleotides, or deoxyribonucleotides, or a mixture of both. Examples of polynucleotides in
the context of the present invention include single and double stranded DNA, single and double
stranded RNA, and hybrid molecules having mixtures of single and double stranded DNA and
RNA. The polynucleotides of the present invention may contain one or more modified
nucleotides.

"Protein," "peptide," and "polypeptide" are used interchangeably to denote an amino acid polymer
or a set of two or more interacting or bound amino acid polymers.

"Purify," or "purified" refers to a target protein that is free from at least 5-10% of contaminating 
proteins. Purification of a protein from contaminating proteins can be accomplished using known 
techniques, including ammonium sulfate or ethanol precipitation, acid precipitation, heat 
precipitation, anion or cation exchange chromatography, phosphocellulose chromatography, 
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite
chromatography, size-exclusion chromatography, and lectin chromatography. Various protein 
purification techniques are illustrated in Current Protocols in Molecular Biology, Ausubel et al., 
eds. (Wiley & Sons, New York, 1988, and quarterly updates). 

"Selectable marker" refers to a marker that identifies a cell as having undergone a recombinant 
DNA or RNA event. Selectable markers include, for example, genes that encode antimetabolite 
resistance such as the DHFR protein that confers resistance to methotrexate (Wigler et al, 1980, 
Proc Natl Acad Sci USA 77:3567; O'Hare et al., 1981, Proc Natl Acad Sci USA, 78:1527), the 
GPT protein that confers resistance to mycophenolic acid (Mulligan & Berg, 1981, PNAS USA,
78:2072), the neomycin resistance marker that confers resistance to the aminoglycoside G-418
(Colberre-Garapin et al., 1981, J Mol Biol, 150:1), the Hygro protein that confers resistance to
hygromycin (Santerre et al., 1984, Gene 30:147), and the Zeocin™ resistance marker
(Invitrogen). In addition, the herpes simplex virus thymidine kinase, hypoxanthine-guanine
phosphoribosyltransferase and adenine phosphoribosyltransferase genes can be employed in tk-,
hgprt- and aprt- cells, respectively.

"Stringency" refers to the conditions (temperature, ionic strength, solvents, etc.) under which
hybridization between polynucleotides occurs. A hybridization reaction conducted under high
stringency conditions is one that will only occur between polynucleotide molecules that have a
high degree of complementary base pairing (85% to 100% identity). Conditions for high
stringency hybridization, for example, may include an overnight incubation at about 42°C for about
2.5 hours in 6 X SSC/0.1% SDS, followed by washing of the filters in 1.0 X SSC at 65°C, 0.1%
SDS. A hybridization reaction conducted under moderate stringency conditions is one that will
occur between polynucleotide molecules that have an intermediate degree of complementary base
pairing (50% to 84% identity).
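
The identity ranges given in this definition can be expressed as a simple lookup. The sketch
below is illustrative only; the function name is hypothetical, and the handling of values between
84% and 85% (treated here as moderate) and below 50% is an assumption, since the definition
does not address them.

```python
def stringency_class(percent_identity: float) -> str:
    """Map percent identity to the stringency class under which pairing is expected.

    Ranges follow the definition above: 85-100% high, 50-84% moderate.
    Assumption: values from 50% up to, but not including, 85% are treated as moderate.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity >= 85.0:
        return "high"
    if percent_identity >= 50.0:
        return "moderate"
    return "below moderate"


print(stringency_class(92.0))  # high
print(stringency_class(70.0))  # moderate
```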

"Substrate targeting moiety" refers to any signal on a substrate, either naturally occurring or 
genetically engineered, used to target any AviIII polypeptide or fragment thereof to a substrate.
Such targeting moieties include ligands that bind to a substrate structure. Examples of
ligand/receptor pairs include carbohydrate binding domains and cellulose. Many such
substrate-specific ligands are known and are useful in the present invention to target an AviIII
polypeptide or fragment thereof to a substrate. A novel example is an AviIII carbohydrate binding
domain that is used to tether other molecules to a cellulose-containing substrate such as a fabric.

"Thermal tolerant" refers to the property of withstanding partial or complete inactivation by heat
and can also be described as thermal resistance or thermal stability. Although some variation
exists in the literature, the following definitions can be considered typical for the optimum
temperature range of stability and activity for enzymes: psychrophilic (below freezing to 10°C);
mesophilic (10°C to 50°C); thermophilic (50°C to 75°C); and caldophilic (75°C to above boiling
water temperature). The stability and catalytic activity of enzymes are linked characteristics, and
the ways of measuring these properties vary considerably. For industrial enzymes, stability and
activity are best measured under use conditions, often in the presence of substrate. Therefore,
cellulases that must act on process streams of cellulose must be able to withstand exposure up to
thermophilic or even caldophilic temperatures for digestion times in excess of several hours.
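
The temperature classes listed above can be sketched as a simple classifier. This is an
illustration only; the function name is hypothetical, and assigning each shared boundary value
(10°C, 50°C, 75°C) to the higher class is an assumption, since the ranges as stated share their
endpoints.

```python
def temperature_class(optimum_celsius: float) -> str:
    """Classify an enzyme by its optimum temperature using the ranges in the text.

    Assumption: a value on a shared boundary is assigned to the higher class.
    """
    if optimum_celsius < 10.0:
        return "psychrophilic"
    if optimum_celsius < 50.0:
        return "mesophilic"
    if optimum_celsius < 75.0:
        return "thermophilic"
    return "caldophilic"


# An enzyme with maximal activity near 79°C falls in the caldophilic range.
print(temperature_class(79.0))  # caldophilic
```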






In encompassing a wide variety of potential applications for embodiments of the present 
invention, thermal tolerance refers to the ability to function in a temperature range of from about 
15°C to about 100°C. A preferred range is from about 30°C to about 80°C. A highly preferred
range is from about 50°C to about 70°C. For example, a protein that can function at about 45°C
is considered in the preferred range even though it may be susceptible to partial or complete
inactivation at temperatures in a range above about 45°C and less than about 80°C. For
polypeptides derived from organisms such as Acidothermus, the desirable property of thermal
tolerance is often accompanied by other desirable characteristics such as: resistance to
extreme pH degradation, resistance to solvent degradation, resistance to proteolytic degradation, 
resistance to detergent degradation, resistance to oxidizing agent degradation, resistance to 
chaotropic agent degradation, and resistance to general degradation. Cowan DA in Danson MJ et 
al. (1992) The Archaebacteria. Biochemistry and Biotechnology at 149-159, University Press, 
Cambridge, ISBN 1855780100. Here 'resistance' is intended to include any partial or complete 
level of residual activity. When a polypeptide is described as thermal tolerant it is understood 
that any one, more than one, or none of these other desirable properties can be present. 

"Variant", as used herein, means a polynucleotide or polypeptide molecule that differs from a 
reference molecule. Variants can include nucleotide changes that result in amino acid 
substitutions, deletions, fusions, or truncations in the resulting variant polypeptide when 
compared to the reference polypeptide. 

"Vector," "extra-chromosomal vector" or "expression vector" refers to a first polynucleotide 
molecule, usually double-stranded, which may have inserted into it a second polynucleotide 
molecule, for example a foreign or heterologous polynucleotide. The heterologous 
polynucleotide molecule may or may not be naturally found in the host cell, and may be, for 
example, one or more additional copies of the heterologous polynucleotide naturally present in the
host genome. The vector is adapted for transporting the foreign polynucleotide molecule into a 
suitable host cell. Once in the host cell, the vector may be capable of integrating into the host cell 
chromosomes. The vector may optionally contain additional elements for selecting cells 
containing the integrated polynucleotide molecule as well as elements to promote transcription of 
mRNA from transfected DNA. Examples of vectors useful in the methods of the present 
invention include, but are not limited to, plasmids, bacteriophages, cosmids, retroviruses, and 
artificial chromosomes. 

Within the application, unless otherwise stated, the techniques utilized may be found in any of
several well-known references, such as: Molecular Cloning: A Laboratory Manual (Sambrook et
al. (1989) Molecular cloning: A Laboratory Manual), Gene Expression Technology (Methods in
Enzymology, Vol. 185, edited by D. Goeddel, 1991 Academic Press, San Diego, CA), "Guide to
Protein Purification" in Methods in Enzymology (M.P. Deutscher, ed., (1990) Academic Press,
Inc.), PCR Protocols: A Guide to Methods and Applications (Innis et al. (1990) Academic Press,
San Diego, CA), Culture of Animal Cells: A Manual of Basic Technique, 2nd ed. (R.I. Freshney
(1987) Liss, Inc., New York, NY), and Gene Transfer and Expression Protocols (pp 109-128, ed.
E.J. Murray, The Humana Press Inc., Clifton, N.J.).

O-Glycoside Hydrolases: 

Glycoside hydrolases are a large and diverse family of enzymes that hydrolyse the glycosidic
bond between two carbohydrate moieties or between a carbohydrate and a non-carbohydrate
moiety (See FIG. 2). Glycoside hydrolase enzymes are classified into glycoside hydrolase (GH)
families based on significant amino acid similarities within their catalytic domains. Enzymes
having related catalytic domains are grouped together within a family (Henrissat et al., (1991)
supra, and Henrissat et al. (1996), Biochem. J. 316:695-696), where the underlying classification
provides a direct relationship between the GH domain amino acid sequence and how a GH
domain will fold. This information ultimately provides a common mechanism for how the
enzyme will hydrolyse the glycosidic bond within a substrate, i.e., either by a retaining
mechanism or inverting mechanism (Henrissat, B., (1991) supra).

Cellulases belong to the GH family of enzymes. Cellulases are produced by a variety of bacteria
and fungi to degrade the β-1,4 glycosidic bond of cellulose and to so produce successively
smaller fragments of cellulose and ultimately produce glucose. At present, cellulases are found
within at least 11 different GH families. Three different types of cellulase enzyme activities
have been identified within these GH families: exo-acting cellulases which cleave successive
disaccharide units from the non-reducing ends of a cellulose chain; endo-acting cellulases which
randomly cleave successive disaccharide units within the cellulose chain; and β-glucosidases
which cleave successive disaccharide units to glucose (J. W. Deacon, (1997) Modern Mycology,
3rd Ed., ISBN: 0-632-03077-1, 97-98).

Many cellulases are characterized by having a multiple domain unit within their overall structure:
a GH or catalytic domain is joined to a carbohydrate-binding domain (CBD) by a glycosylated
linker peptide (Koivula et al., (1996) Protein Expression and Purification 8:391-400). As noted
above, cellulases do not belong to any one family of GH domains, but rather have been identified
within at least 11 different GH families to date. The CBD type domain increases the
concentration of the enzyme on the substrate, in this case cellulose, and the linker peptide
provides flexibility for both larger domains.

Conversion of cellulose to glucose is an essential step in the production of ethanol or other 
biofuels from biomass. Cellulases are an important component of this process, where 
approximately one kilogram of cellulase can digest fifty kilograms of cellulose. Within this 
process, thermostable cellulases have taken precedence, due to their ability to function at elevated 
temperatures and under other conditions including pH extremes, solvent presence, detergent 
presence, proteolysis, etc. (see Cowan DA (1992), supra). 
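
As a rough illustration of the ratio stated above (approximately one kilogram of cellulase per fifty kilograms of cellulose), the following Python sketch estimates an enzyme requirement for a given cellulose load. The function name and the constant name are hypothetical, and the calculation assumes the stated ratio scales linearly, which the text does not establish.

```python
# Ratio taken from the text: ~1 kg cellulase digests ~50 kg cellulose.
# Linear scaling is an assumption made only for this illustration.
CELLULOSE_KG_PER_KG_CELLULASE = 50.0


def cellulase_required_kg(cellulose_kg: float) -> float:
    """Estimate the cellulase mass (kg) needed for a given cellulose mass (kg)."""
    if cellulose_kg < 0:
        raise ValueError("cellulose mass must be non-negative")
    return cellulose_kg / CELLULOSE_KG_PER_KG_CELLULASE
```

Under these assumptions, a call such as `cellulase_required_kg(1000)` gives the estimated cellulase mass for one metric ton of cellulose.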

Highly thermostable cellulase enzymes are secreted by the cellulolytic thermophile Acidothermus 
cellulolyticus (U.S. Patent Nos. 5,275,944 and 5,110,735). This bacterium was originally 
isolated from decaying wood in an acidic, thermal pool at Yellowstone National Park and 
deposited with the American Type Culture Collection (ATCC 43068) (Mohagheghi et al. (1986), 
Int. J. System. Bacteriol., 36:435-443). 

Recently, a thermostable cellulase, E1 endoglucanase, was identified and characterized from 
Acidothermus cellulolyticus (U.S. Patent No. 5,536,655). The E1 endoglucanase has maximal 
activity between 75 and 83°C and is active to a pH well below 5. Thermostable cellulases, 
including E1 endoglucanase, are useful in the conversion of biomass to biofuels, and in particular 
in the conversion of cellulose to glucose. Conversion of biomass to biofuel represents an extremely 
important alternative fuel source that is more environmentally friendly than conventional fuels, 
and provides a use, in some cases, for waste products. 

AviIII: 

As described more fully in the Examples below, AviIII, a novel thermostable cellulase, has now 
been identified and characterized. The predicted amino acid sequence of AviIII (SEQ ID NO:1) 
has an organization characteristic of a cellulase enzyme. AviIII contains a carbohydrate binding 
domain - linker domain - catalytic domain - linker domain - fibronectin domain - linker domain - 
carbohydrate binding domain unit. In particular, AviIII includes a carbohydrate binding domain 
type III (CBDIII) (amino acids from about A35 to about A187), a GH74 catalytic domain (amino 
acids from about N231 to about P870), and a CBDII (amino acids from about G1021 to about 
S1121). 

As discussed in more detail below (Example 2), significant amino acid similarity of AviIII to 
other cellulases identifies AviIII as a cellulase. In addition, the predicted amino acid sequence 
(SEQ ID NO: 1) indicates that a CBD type III domain is present as characterized by Tomme P. et 
al. (1995), in Enzymatic Degradation of Insoluble Polysaccharides (Saddler JN & Penner M, 
eds.), at 142-163, American Chemical Society, Washington. See also Tomme, P. & Claeyssens, 
M. (1989) FEBS Lett. 243, 239-243; Gilkes, N.R. et al. (1988) J. Biol. Chem. 263, 10401-10407. 

AviIII, as noted above, has a catalytic domain, identified as belonging to the GH74 family. The 
GH74 domain family includes a number of exoglucanases, for example, from Cellulomonas fimi, 
and exoglucanase E3 isolated from Thermobifida fusca. The GH74 members degrade substrate 
using an inverting mechanism. Being a member of the GH74 family of proteins identifies AviIII 
as potentially having cellulase activity. 

AviIII is also a thermostable cellulase, as it is produced by the thermophile Acidothermus 
cellulolyticus. As discussed, AviIII polypeptides can have other desirable characteristics (see 
Cowan DA (1992), supra). Like other members of the cellulase family, and in particular 
thermostable cellulases, AviIII polypeptides are useful in the conversion of biomass to biofuels 
and biofuel additives, and in particular, biofuels from cellulose. It is envisioned that AviIII 
polypeptides could be used for other purposes, for example in detergents, pulp and paper 
processing, food and feed processing, and in textile processes. AviIII polypeptides can be used 
alone or in combination with one or more other cellulases or glycoside hydrolases to perform the 
uses described herein or known within the relevant art, all of which are within the scope of the 
present disclosure. 

AviIII Polypeptides: 

AviIII polypeptides of the invention include isolated polypeptides having an amino acid sequence 
as shown below in Example 1, Table 1, and in SEQ ID NO:1, as well as variants and derivatives, 
including fragments, having substantial identity to the amino acid sequence of SEQ ID NO:1 and 
that retain any of the functional activities of AviIII. AviIII polypeptide activity can be determined, 
for example, by subjecting the variant, derivative, or fragment to a substrate binding assay or a 
cellulase activity assay such as those described in Irwin D et al., J. Bacteriology 180(7):1709-1714 
(April 1998). 



Table 1. AviIII amino acid sequence. (SEQ ID NO: 1) 

MDRSENIRLTMRSRRLVSLLAATASFAVAAALGVLPIAITASPAHAATTQ 
PYTWSNVAIGGGGFVDGIVFNEGAPGILYVRTDIGGMYRWDAANGRWIPL 
LDWVGWNNWGYNGVVSIAADPINTNKVWAAVGMYTNSWDPNDGAILRSSD 
QGATWQITPLPFKLGGNMPGRGMGERLAVDPNNDNILYFGAPSGKGLWRS 
TDSGATWSQMTNFPDVGTYIANPTDTTGYQSDIQGVVWVAFDKSSSSLGQ 
ASKTIFVGVADPNNPVFWSRDGGATWQAVPGAPTGFIPHKGVFDPVNHVL 
YIATSNTGGPYDGSSGDVWKFSVTSGTWTRISPVPSTDTANDYFGYSGLT 
IDRQHPNTIMVATQISWWPDTIIFRSTDGGATWTRIWDWTSYPNRSLRYV 
LDISAEPWLTFGVQPNPPVPSPKLGWMDEAMAIDPFNSDRMLYGTGATLY 
ATNDLTKWDSGGQIHIAPMVKGLEETAVNDLISPPSGAPLISALGDLGGF 
THADVTAVPSTIFTSPVFTTGTSVDYAELNPSIIVRAGSFDPSSQPNDRH 
VAFSTDGGKNWFQGSEPGGVTTGGTVAASADGSRFVWAPGDPGQPVVYAV 
GFGNSWAASQGVPANAQIRSDRVNPKTFYALSNGTFYRSTDGGVTFQPVA 
AGLPSSGAVGVMFHAVPGKEGDLWLAASSGLYHSTNGGSSWSAITGVSSA 
VNVGFGKSAPGSSYPAVFVVGTIGGVTGAYRSDDCGTTWVLINDDQHQYG 
NWGQAITGDHANLRRVYIGTNGRGIVYGDIGGAPSGSPSPSVSPSASPSL 
SPSPSPSSSPSPSPSPSSSPSSSPSPSPSPSPSPSRSPSPSASPSPSSSP 
SPSSSPSSSPSPTPSSSPVSGGVKVQYKNNDSAPGDNQIKPGLQVVNTGS 
SSVDLSTVTVRYWFTRDGGSSTLVYNCDWAAIGCGNIRASFGSVNPATPT 
ADTYLQX* 



As listed and described in Tables 1 and 5, the isolated AviIII polypeptide includes an N-terminal 
hydrophobic region that functions as a signal peptide, having an amino acid sequence that begins 
with Met1 and extends to about A34; a carbohydrate binding domain having sequence similarity to 
such type III domains that begins with about A35 and extends to about A187; a catalytic domain 
having significant sequence similarity to a GH74 family domain that begins with about N231 and 
extends to about P870; a fibronectin type III domain that begins with about D901 and extends to 
about G985; and a carbohydrate binding domain type II region that begins with about G1021 and 
extends to about S1121. Variants and derivatives of AviIII include, for example, AviIII 
polypeptides modified by covalent or aggregative conjugation with other chemical moieties, such as 
glycosyl groups, polyethylene glycol (PEG) groups, lipids, phosphate, acetyl groups, and the like. 

The amino acid sequence of AviIII polypeptides of the invention is preferably at least about 60% 
identical, more preferably at least about 70% identical, or in some embodiments at least about 90% 
identical, to the AviIII amino acid sequence shown above in Table 1 and SEQ ID NO:1. The 
percentage identity, also termed homology (see definition above), can be readily determined, for 
example, by comparing the two polypeptide sequences using any of the computer programs 
commonly employed for this purpose, such as the Gap program (Wisconsin Sequence Analysis 
Package, Version 8 for Unix, Genetics Computer Group, University Research Park, Madison 
Wisconsin), which uses the algorithm of Smith and Waterman, 1981, Adv. Appl. Math. 2: 482-489. 
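
By way of illustration only, the percentage identity comparison described above can be sketched in Python. The sketch does not reproduce the Gap program or any other alignment algorithm; it assumes the two sequences have already been aligned (with "-" marking gaps) and counts identical residues over gap-free columns, which is only one of several conventions for the denominator. The function names percent_identity and identity_tier are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumes the inputs are already aligned and of equal length ('-' = gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns (an assumed convention)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


def identity_tier(pct: float):
    """Return the highest threshold named in the text (90, 70, or 60) that pct meets."""
    for threshold in (90.0, 70.0, 60.0):
        if pct >= threshold:
            return threshold
    return None


# Illustrative input: the first ten residues of Table 1 and a made-up single-residue variant.
reference = "MDRSENIRLT"
variant = "MDRSENIKLT"
tier = identity_tier(percent_identity(reference, variant))
```

A production comparison would instead rely on an established alignment program, since the alignment itself strongly affects the reported identity.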

Variants and derivatives of the AviIII polypeptide may further include, for example, fusion 
proteins formed of an AviIII polypeptide and a heterologous polypeptide. Preferred heterologous 
polypeptides include those that facilitate purification, oligomerization, stability, or secretion of 
the AviIII polypeptides. 

AviIII polypeptide variants and derivatives, as used in the description of the invention, can contain 
conservatively substituted amino acids, meaning that one or more amino acids can be replaced by an 
amino acid that does not alter the secondary and/or tertiary structure of the polypeptide. Such 
substitutions can include the replacement of an amino acid by a residue having similar 
physicochemical properties, such as substituting one aliphatic residue (Ile, Val, Leu, or Ala) for 
another, or substitutions between basic residues Lys and Arg, acidic residues Glu and Asp, amide 
residues Gln and Asn, hydroxyl residues Ser and Tyr, or aromatic residues Phe and Tyr. 
Phenotypically silent amino acid exchanges are described more fully in Bowie et al., 1990, Science 
247:1306-1310. In addition, functional AviIII polypeptide variants include those having amino acid 
substitutions, deletions, or additions to the amino acid sequence outside functional regions of the 
protein, for example, outside the catalytic and carbohydrate binding domains. These would include, 
for example, the various linker sequences that connect functional domains as defined herein. 
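
The substitution groups listed above can be expressed as a simple lookup, sketched here in Python for illustration. The grouping follows the residues named in the text exactly as written (using one-letter codes); the names CONSERVATIVE_GROUPS and is_conservative are hypothetical, and membership in a group is only a heuristic that does not by itself show that a given exchange preserves structure or activity.

```python
# Groups transcribed from the text, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = (
    frozenset("IVLA"),  # aliphatic: Ile, Val, Leu, Ala
    frozenset("KR"),    # basic: Lys, Arg
    frozenset("ED"),    # acidic: Glu, Asp
    frozenset("QN"),    # amide: Gln, Asn
    frozenset("SY"),    # hydroxyl: Ser, Tyr (as listed in the text)
    frozenset("FY"),    # aromatic: Phe, Tyr
)


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall within one of the groups named in the text."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

For example, `is_conservative("I", "V")` tests an aliphatic exchange, while `is_conservative("K", "E")` tests an exchange between a basic and an acidic residue.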

The AviIII polypeptides of the present invention are preferably provided in an isolated form, and 
preferably are substantially purified. The polypeptides may be recovered and purified from 
recombinant cell cultures by known methods, including, for example, ammonium sulfate or ethanol 
precipitation, anion or cation exchange chromatography, phosphocellulose chromatography, 
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite 
chromatography, and lectin chromatography. Preferably, high performance liquid chromatography 
(HPLC) is employed for purification. 

Another preferred form of AviIII polypeptides is that of recombinant polypeptides as expressed by 
suitable hosts. Furthermore, the hosts can simultaneously produce other cellulases such that a 
mixture is produced comprising an AviIII polypeptide and one or more other cellulases. Such a 
mixture can be effective in crude fermentation processing or other industrial processing. 

AviIII polypeptides can be fused to heterologous polypeptides to facilitate purification. Many 
available heterologous peptides (peptide tags) allow selective binding of the fusion protein to a 
binding partner. Non-limiting examples of peptide tags include 6-His, thioredoxin, hemagglutinin, 
GST, and the OmpA signal sequence tag. A binding partner that recognizes and binds to the 
heterologous peptide can be any molecule or compound, including metal ions (for example, metal 
affinity columns), antibodies, antibody fragments, or any protein or peptide that preferentially binds 
the heterologous peptide to permit purification of the fusion protein. 

AviIII polypeptides can be modified to facilitate formation of AviIII oligomers. For example, AviIII 
polypeptides can be fused to peptide moieties that promote oligomerization, such as leucine zippers 
and certain antibody fragment polypeptides, for example, Fc polypeptides. Techniques for 
preparing these fusion proteins are known, and are described, for example, in WO 99/31241 and in 
Cosman et al., 2001, Immunity 14:123-133. Fusion to an Fc polypeptide offers the additional 
advantage of facilitating purification by affinity chromatography over Protein A or Protein G 
columns. Fusion to a leucine zipper (LZ), for example, a repetitive heptad repeat, often with four or 
five leucine residues interspersed with other amino acids, is described in Landschulz et al., 1988, 
Science, 240:1759. 

It is also envisioned that an expanded set of variants and derivatives of AviIII polynucleotides 
and/or polypeptides can be generated to select for useful molecules, where such expansion is 
achieved not only by conventional methods such as site-directed mutagenesis (SDM) but also by 
more modern techniques, either independently or in combination. 

Site-directed mutagenesis is considered an informational approach to protein engineering and can 
rely on high-resolution crystallographic structures of target proteins and some stratagem for specific 
amino acid changes (Van Den Burg, B.; Vriend, G.; Veltman, O.R.; Venema, G.; Eijsink, V.G.H. 
Proc. Nat. Acad. Sci. U.S. 1998, 95, 2056-2060). For example, modification of the amino acid 
sequence of AviIII polypeptides can be accomplished as is known in the art, such as by introducing 
mutations at particular locations by oligonucleotide-directed mutagenesis (Walder et al., 1986, 
Gene, 42:133; Bauer et al., 1985, Gene 37:73; Craik, 1985, BioTechniques, 12-19; Smith et al., 
1981, Genetic Engineering: Principles and Methods, Plenum Press; and U.S. Patent No. 4,518,584 
and U.S. Patent No. 4,737,462). SDM technology can also employ the recent advent of 
computational methods for identifying site-specific changes for a variety of protein engineering 
objectives (Hellinga, H.W. Nature Structural Biol. 1998, 5, 525-527). 

The more modern techniques include, but are not limited to, non-informational mutagenesis 
techniques (referred to generically as "directed evolution"). Directed evolution, in conjunction with 
high-throughput screening, allows testing of statistically meaningful variations in protein 
conformation (Arnold, F.H. Nature Biotechnol. 1998, 16, 617-618). Directed evolution technology 
can include diversification methods similar to that described by Crameri A. et al. (1998, Nature 391: 
288-291), site-saturation mutagenesis, staggered extension process (StEP) (Zhao, H.; Giver, L.; 
Shao, Z.; Affholter, J.A.; Arnold, F.H. Nature Biotechnol. 1998, 16, 258-262), and DNA 
synthesis/reassembly (U.S. Patent 5,965,408). 

Fragments of the AviIII polypeptide can be used, for example, to generate specific anti-AviIII 
antibodies. Using known selection techniques, specific epitopes can be selected and used to 
generate monoclonal or polyclonal antibodies. Such antibodies have utility in the assay of AviIII 
activity as well as in purifying recombinant AviIII polypeptides from genetically engineered host 
cells. 

AviIII Polynucleotides: 

The invention also provides polynucleotide molecules encoding the AviIII polypeptides discussed 
above. AviIII polynucleotide molecules of the invention include polynucleotide molecules 
having the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2; polynucleotide molecules 
that hybridize to the nucleic acid sequence of Table 2 and SEQ ID NO: 2 under high stringency 
hybridization conditions (for example, 42°C, 2.5 hr., 6X SSC, 0.1% SDS); and polynucleotide 
molecules having substantial nucleic acid sequence identity with the nucleic acid sequence of 
Table 2 and SEQ ID NO: 2, particularly with those nucleic acids encoding the catalytic domain, 
GH74 (from about amino acid A37 to about G776), and the carbohydrate binding domain III (from 
about amino acid V859 to at least about Q946). 



Table 2. AviIII nucleotide sequence. (SEQ ID NO: 2) 



ATGGATCGTTCGGAGAACATCCGTCTGACTATGAGATCACGACGATTGGTATCACTGCTCGCCGCCACTGCGTCGT 

TCGCCGTGGCCGCCGCTCTGGGAGTTCTGCCCATCGCGATAACGGCTTCTCCTGCGCACGCGGCGACGACTCAGCC 

GTACACCTGGAGCAACGTGGCGATCGGGGGCGGCGGCTTTGTCGACGGGATCGTCTTCAATGAAGGTGCACCGGGA 

ATTCTGTACGTGCGGACGGACATCGGGGGGATGTATCGATGGGATGCCGCCAACGGGCGGTGGATCCCTCTTCTGG 

ATTGGGTGGGATGGAACAATTGGGGGTACAACGGCGTCGTCAGCATTGCGGCAGACCCGATCAATACTAACAAGGT 

ATGGGCCGCCGTCGGAATGTACACCAACAGCTGGGACCCAAACGACGGAGCGATTCTCCGCTCGTCTGATCAGGGC 

GCAACGTGGCAAATAACGCCCCTGCCGTTCAAGCTTGGCGGCAACATGCCCGGGCGTGGAATGGGCGAGCGGCTTG 

CGGTGGATCCAAACAATGACAACATTCTGTATTTCGGCGCCCCGAGCGGCAAAGGGCTCTGGAGAAGCACAGATTC 

CGGCGCGACCTGGTCCCAGATGACGAACTTTCCGGACGTAGGCACGTACATTGCAAATCCCACTGACACGACCGGC 

TATCAGAGCGATATTCAAGGCGTCGTCTGGGTCGCTTTCGACAAGTCTTCGTCATCGCTCGGGCAAGCGAGTAAGA 

CCATTTTTGTGGGCGTGGCGGATCCCAATAATCCGGTCTTCTGGAGCAGAGACGGCGGCGCGACGTGGCAGGCGGT 

GCCGGGTGCGCCGACCGGCTTCATCCCGCACAAGGGCGTCTTTGACCCGGTCAACCACGTGCTCTATATTGCCACC 

AGCAATACGGGTGGTCCGTATGACGGGAGCTCCGGCGACGTCTGGAAATTCTCGGTGACCTCCGGGACATGGACGC 

GAATCAGCCCGGTACCTTCGACGGACACGGCCAACGACTACTTTGGTTACAGCGGCCTCACTATCGACCGCCAGCA 

CCCGAACACGATAATGGTGGCAACCCAGATATCGTGGTGGCCGGACACCATAATCTTTCGGAGCACCGACGGCGGT 

GCGACGTGGACGCGGATCTGGGATTGGACGAGTTATCCCAATCGAAGCTTGCGATATGTGCTTGACATTTCGGCGG 

AGCCTTGGCTGACCTTCGGCGTACAGCCGAATCCTCCCGTACCCAGTCCGAAGCTCGGCTGGATGGATGAAGCGAT 

GGCAATCGATCCGTTCAACTCTGATCGGATGCTCTACGGAACAGGCGCGACGTTGTACGCAACAAATGATCTCACG 

AAGTGGGACTCCGGCGGCCAGATTCATATCGCGCCGATGGTCAAAGGATTGGAGGAGACGGCGGTAAACGATCTCA 

TCAGCCCGCCGTCTGGCGCCCCGCTCATCAGCGCTCTCGGAGACCTCGGCGGCTTCACCCACGCCGACGTTACTGC 

CGTGCCATCGACGATCTTCACGTCACCGGTGTTCACGACCGGCACCAGCGTCGACTATGCGGAATTGAATCCGTCG 

ATCATCGTTCGCGCTGGAAGTTTCGATCCATCGAGCCAACCGAACGACAGGCACGTCGCGTTCTCGACAGACGGCG 

GCAAGAACTGGTTCCAAGGCAGCGAACCTGGCGGGGTGACGACGGGCGGCACCGTCGCCGCATCGGCCGACGGCTC 

TCGTTTCGTCTGGGCTCCCGGCGATCCCGGTCAGCCTGTGGTGTACGCAGTCGGATTTGGCAACTCCTGGGCTGCT 

TCGCAAGGTGTTCCCGCCAATGCCCAGATCCGCTCAGACCGGGTGAATCCAAAGACTTTCTATGCCCTATCCAATG 

GAACCTTCTATCGAAGCACGGACGGCGGCGTGACATTCCAACCGGTCGCGGCCGGTCTTCCGAGCAGCGGTGCCGT 

CGGTGTCATGTTCCACGCGGTGCCTGGAAAAGAAGGCGATCTGTGGCTCGCTGCATCGAGCGGGCTTTACCACTCA 

ACCAATGGCGGCAGCAGTTGGTCTGCAATCACCGGCGTATCCTCCGCGGTGAACGTGGGATTTGGTAAGTCTGCGC 

CCGGGTCGTCATACCCAGCCGTCTTTGTCGTCGGCACGATCGGAGGCGTTACGGGGGCGTACCGCTCCGACGACTG 

TGGGACGACCTGGGTACTGATCAATGATGACCAGCACCAATACGGAAATTGGGGACAAGCAATCACCGGTGACCAC 

GCGAATTTACGGCGGGTGTACATAGGCACGAACGGCCGTGGAATTGTATACGGGGACATTGGTGGTGCGCCGTCCG 

GATCGCCGTCTCCGTCGGTGAGTCCGTCGGCTTCGCCGAGCCTGAGCCCGAGCCCGAGCCCGAGCAGCTCGCCATC 

GCCGTCGCCGTCGCCGAGCTCGAGTCCATCCTCGTCGCCGTCTCCGTCGCCGTCACCATCGCCGAGTCCGTCTCGG 

TCTCCGTCACCATCGGCGTCGCCGAGCCCGTCTTCGTCACCGAGCCCGTCTTCGTCACCGTCTTCGTCGCCGAGCC 

CAACGCCGTCGTCGTCGCCGGTGTCGGGTGGGGTGAAGGTGCAGTATAAGAATAATGATTCGGCGCCGGGTGATAA 

TCAGATCAAGCCGGGTTTGCAGGTGGTGAATACCGGGTCGTCGTCGGTGGATTTGTCGACGGTGACGGTGCGGTAC 

TGGTTCACCCGGGATGGTGGCTCGTCGACACTGGTGTACAACTGTGACTGGGCGGCGATCGGGTGTGGGAATATCC 

GCGCCTCGTTCGGCTCGGTGAACCCGGCGACGCCGACGGCGGACACCTACCTGCAGN* 



The AviIII polynucleotide molecules of the invention are preferably isolated molecules encoding the 
AviIII polypeptide having an amino acid sequence as shown in Table 1 and SEQ ID NO:1, as well as 
derivatives, variants, and useful fragments of the AviIII polynucleotide. The AviIII polynucleotide 
sequence can include deletions, substitutions, or additions to the nucleic acid sequence of Table 2 
and SEQ ID NO: 2. 

The AviIII polynucleotide molecule of the invention can be cDNA, chemically synthesized DNA, 
DNA amplified by PCR, RNA, or combinations thereof. Due to the degeneracy of the genetic 
code, two DNA sequences may differ and yet encode identical amino acid sequences. The present 
invention thus provides an isolated polynucleotide molecule having an AviIII nucleic acid sequence 
encoding AviIII polypeptide, where the nucleic acid sequence encodes a polypeptide having the 
complete amino acid sequence as shown in Table 1 and SEQ ID NO: 1, or variants, derivatives, 
and fragments thereof. 
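
The degeneracy of the genetic code noted above can be illustrated with a short Python sketch. It translates two different DNA sequences using the standard genetic code and shows that they encode the same peptide. The first sequence is the first 30 nucleotides of Table 2; the second is a made-up synonymous variant constructed for this illustration, and the names CODON_TABLE and translate are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 30 nucleotides of Table 2, and an illustrative synonymous variant (assumed).
table2_start = "ATGGATCGTTCGGAGAACATCCGTCTGACT"
synonymous_variant = "ATGGACCGCTCCGAAAATATTCGCCTCACC"

same_peptide = translate(table2_start) == translate(synonymous_variant)
```

In this sketch the two nucleotide sequences differ at several positions, yet `same_peptide` evaluates to true because each changed codon specifies the same amino acid.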

The AviIII polynucleotides of the invention have a nucleic acid sequence that is at least about 60% 
identical to the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2, in some embodiments 
at least about 70% identical to the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2, and 
in other embodiments at least about 90% identical to the nucleic acid sequence shown in Table 2 
and SEQ ID NO: 2. Nucleic acid sequence identity is determined by known methods, for example 
by aligning two sequences in a software program such as the BLAST program (Altschul, S.F. et al. 
(1990) J. Mol. Biol. 215:403-410), from the National Center for Biotechnology Information 
(http://www.ncbi.nlm.nih.gov/BLAST/). 

The AviIII polynucleotide molecules of the invention also include isolated polynucleotide 
molecules having a nucleic acid sequence that hybridizes under high stringency conditions (as 
defined above) to the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2. Hybridization 
of the polynucleotide is to about 15 contiguous nucleotides, or about 20 contiguous nucleotides, and 
in other embodiments about 30 contiguous nucleotides, and in still other embodiments about 100 
contiguous nucleotides of the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2. 

Useful fragments of the AviIII-encoding polynucleotide molecules described herein include probes 
and primers. Such probes and primers can be used, for example, in PCR methods to amplify and 
detect the presence of AviIII polynucleotides in vitro, as well as in Southern and Northern blots for 
analysis of AviIII. Cells expressing the AviIII polynucleotide molecules of the invention can also 
be identified by the use of such probes. Methods for the production and use of such primers and 
probes are known. For PCR, 5' and 3' primers corresponding to a region at the termini of the AviIII 
polynucleotide molecule can be employed to isolate and amplify the AviIII polynucleotide using 
conventional techniques. 
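
As a minimal sketch of how 5' and 3' primers might be derived from the termini of a template, the following Python example takes the first bases of a sequence as the forward primer and the reverse complement of the last bases as the reverse primer. The 20-nucleotide primer length, the function names, and the use of the first line of Table 2 as a stand-in template are all assumptions made for illustration; practical primer design would also consider melting temperature, secondary structure, and specificity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_primers(template: str, length: int = 20):
    """Return (forward, reverse) primers matching the two termini of the template.

    The forward primer is the 5' end of the given strand; the reverse primer is
    the reverse complement of its 3' end, so both are written 5' to 3'.
    """
    if length <= 0 or len(template) < 2 * length:
        raise ValueError("template too short for the requested primer length")
    template = template.upper()
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


# Stand-in template: the first line of Table 2 (not the full gene termini).
template = (
    "ATGGATCGTTCGGAGAACATCCGTCTGACTATGAGATCACGACGATTGGTATCACTGCTCGCCGCCACTGCGTCGT"
)
forward_primer, reverse_primer = terminal_primers(template)
```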

Other useful fragments of the AviIII polynucleotides include antisense or sense oligonucleotides 
comprising a single-stranded nucleic acid sequence capable of binding to a target AviIII mRNA 
(using a sense strand), or DNA (using an antisense strand) sequence. 

Vectors and Host Cells: 

The present invention also provides vectors containing the polynucleotide molecules of the 
invention, as well as host cells transformed with such vectors. Any of the polynucleotide molecules 
of the invention may be contained in a vector, which generally includes a selectable marker and an 
origin of replication, for propagation in a host. The vectors further include suitable transcriptional 
or translational regulatory sequences, such as those derived from mammalian, microbial, viral, or 
insect genes, operably linked to the AviIII polynucleotide molecule. Examples of such regulatory 
sequences include transcriptional promoters, operators, or enhancers, mRNA ribosomal binding 
sites, and appropriate sequences which control transcription and translation. Nucleotide sequences 
are operably linked when the regulatory sequence functionally relates to the DNA encoding the 
target protein. Thus, a promoter nucleotide sequence is operably linked to an AviIII DNA sequence 
if the promoter nucleotide sequence directs the transcription of the AviIII sequence. 

Selection of suitable vectors for the cloning of AviIII polynucleotide molecules encoding the target 
AviIII polypeptides of this invention will depend upon the host cell in which the vector will be 
transformed, and, where applicable, the host cell from which the target polypeptide is to be 
expressed. Suitable host cells for expression of AviIII polypeptides include prokaryotes, yeast, and 
higher eukaryotic cells, each of which is discussed below. 

The AviIII polypeptides to be expressed in such host cells may also be fusion proteins that include 
regions from heterologous proteins. As discussed above, such regions may be included to allow, for 
example, secretion, improved stability, or facilitated purification of the AviIII polypeptide. For 
example, a nucleic acid sequence encoding an appropriate signal peptide can be incorporated into an 
expression vector. A nucleic acid sequence encoding a signal peptide (secretory leader) may be 
fused in-frame to the AviIII sequence so that AviIII is translated as a fusion protein comprising the 
signal peptide. A signal peptide that is functional in the intended host cell promotes extracellular 
secretion of the AviIII polypeptide. Preferably, the signal sequence will be cleaved from the AviIII 
polypeptide upon secretion of AviIII from the cell. Non-limiting examples of signal sequences that 
can be used in practicing the invention include the yeast α-factor and the honeybee melittin leader in 
Sf9 insect cells. 

Suitable host cells for expression of target polypeptides of the invention include prokaryotes, yeast, 
and higher eukaryotic cells. Suitable prokaryotic hosts to be used for the expression of these 
polypeptides include bacteria of the genera Escherichia, Bacillus, and Salmonella, as well as 
members of the genera Pseudomonas, Streptomyces, and Staphylococcus. For expression in 
prokaryotic cells, for example, in E. coli, the polynucleotide molecule encoding AviIII polypeptide 
preferably includes an N-terminal methionine residue to facilitate expression of the recombinant 
polypeptide. The N-terminal Met may optionally be cleaved from the expressed polypeptide. 

Expression vectors for use in prokaryotic hosts generally comprise one or more phenotypic 
selectable marker genes. Such genes encode, for example, a protein that confers antibiotic 
resistance or that supplies an auxotrophic requirement. A wide variety of such vectors are readily 
available from commercial sources. Examples include pSPORT vectors, pGEM vectors (Promega, 
Madison, WI), pPROEX vectors (LTI, Bethesda, MD), Bluescript vectors (Stratagene), and pQE 
vectors (Qiagen). 

AviIII can also be expressed in yeast host cells from genera including Saccharomyces, Pichia, and 
Kluyveromyces. Preferred yeast hosts are S. cerevisiae and P. pastoris. Yeast vectors will often 
contain an origin of replication sequence from a 2µ yeast plasmid, an autonomously replicating 
sequence (ARS), a promoter region, sequences for polyadenylation, sequences for transcription 
termination, and a selectable marker gene. Vectors replicable in both yeast and E. coli (termed 
shuttle vectors) may also be used. In addition to the above-mentioned features of yeast vectors, a 
shuttle vector will also include sequences for replication and selection in E. coli. Direct secretion of 
the target polypeptides expressed in yeast hosts may be accomplished by the inclusion of a nucleotide 
sequence encoding the yeast α-factor leader sequence at the 5' end of the AviIII-encoding nucleotide 
sequence. 

Insect host cell culture systems can also be used for the expression of AviIII polypeptides. The 
target polypeptides of the invention are preferably expressed using a baculovirus expression system, 
as described, for example, in the review by Luckow and Summers, 1988, Bio/Technology 6:47. 

The choice of a suitable expression vector for expression of AviIII polypeptides of the invention 
will depend upon the host cell to be used. Examples of suitable expression vectors for E. coli 
include pET, pUC, and similar vectors as is known in the art. Preferred vectors for expression of 
the AviIII polypeptides include the shuttle plasmid pIJ702 for Streptomyces lividans, 
pGAPZalpha-A, B, C and pPICZalpha-A, B, C (Invitrogen) for Pichia pastoris, and pFE-1 and pFE-2 
for filamentous fungi, and similar vectors as is known in the art. 

Modification of an AviIII polynucleotide molecule to facilitate insertion into a particular vector 
(for example, by modifying restriction sites), ease of use in a particular expression system or host 
(for example, using preferred host codons), and the like, are known and are contemplated for use 
in the invention. Genetic engineering methods for the production of AviIII polypeptides include 
the expression of the polynucleotide molecules in cell-free expression systems, in cellular hosts, 
in tissues, and in animal models, according to known methods. 

Compositions 

The invention provides compositions containing a substantially purified AviIII polypeptide of the 
invention and an acceptable carrier. Such compositions are administered to biomass, for 
example, to degrade the cellulose in the biomass into simpler carbohydrate units and ultimately 
to sugars. The sugars released from the cellulose are converted into ethanol by any number of 
different catalysts. Such compositions may also be included in detergents for removal, for 
example, of cellulose-containing stains within fabrics, or in compositions used in the pulp and paper 
industry to address conditions associated with cellulose content. Compositions of the present 
invention can be used in stonewashing jeans, as is well known in the art. Compositions can also 
be used in the biopolishing of cellulosic fabrics, such as cotton, linen, rayon and Lyocell. 

The invention provides pharmaceutical compositions containing a substantially purified AviIII 
polypeptide of the invention and, if necessary, a pharmaceutically acceptable carrier. Such 
pharmaceutical compositions are administered to cells, tissues, or patients, for example, to aid in 
delivery or targeting of other pharmaceutical compositions. For example, AviIII polypeptides 
may be used where carbohydrate-mediated liposomal interactions are involved with target cells. 
See Vyas SP et al. (2001), J. Pharmacy & Pharmaceutical Sciences May-Aug 4(2): 138-58. 

The invention also provides reagents, compositions, and methods that are useful for analysis of 
AviIII activity and for the analysis of cellulose breakdown. 

Compositions of the present invention may also include other known cellulases, and preferably 
other known thermal tolerant cellulases, for enhanced treatment of cellulose. 

Antibodies 

The polypeptides of the present invention, in whole or in part, may be used to raise polyclonal and 
monoclonal antibodies that are useful in purifying AviIII or detecting AviIII polypeptide 
expression, as well as serving as a reagent tool for characterizing the molecular actions of the AviIII 
polypeptide. Preferably, a peptide containing a unique epitope of the AviIII polypeptide is used in 
preparation of antibodies, using conventional techniques. Methods for the selection of peptide 
epitopes and production of antibodies are known. See, for example, Antibodies: A Laboratory 
Manual, Harlow and Lane (eds.), 1988, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 
N.Y.; Monoclonal Antibodies, Hybridomas: A New Dimension in Biological Analyses, Kennett et 
al. (eds.), 1980, Plenum Press, New York. 

Assays 

Agents that modify, for example, increase or decrease, AviIII hydrolysis or degradation of 
cellulose can be identified, for example, by assay of AviIII cellulase activity and/or analysis of 
AviIII binding to a cellulose substrate. Incubation of cellulose in the presence of AviIII and in the 
presence or absence of a test agent, and correlation of cellulase activity or carbohydrate binding, 
permits screening of such agents. For example, cellulase activity and binding assays may be 
performed in a manner similar to those described in Irwin et al., J. Bacteriology 180(7):1709-1714 
(April 1998). 

The AviIII-stimulated activity is determined in the presence and absence of a test agent and then 
compared. A lower AviIII-activated test activity in the presence of the test agent than in the 
absence of the test agent indicates that the test agent has decreased the activity of the AviIII. A 
higher AviIII-activated test activity in the presence of the test agent than in the absence of the test 
agent indicates that the test agent has increased the activity of the AviIII. Stimulators and 
inhibitors of AviIII may be used to augment, inhibit, or modify AviIII-mediated activity, and 
therefore may have potential industrial uses as well as potential use in the further elucidation of 
AviIII's molecular actions. 
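
The comparison described above can be sketched, purely for illustration, as a small Python function that classifies a test agent from activity measured with and without the agent. The function name, the labels, and the 5% tolerance band used to treat small differences as no effect are assumptions introduced here; the text itself specifies only that lower activity indicates a decrease and higher activity an increase.

```python
def classify_agent(activity_with: float, activity_without: float,
                   tolerance: float = 0.05) -> str:
    """Classify a test agent by comparing activity with and without the agent.

    tolerance is an assumed fractional band around the no-agent activity within
    which the agent is reported as having no detectable effect.
    """
    if activity_without <= 0:
        raise ValueError("activity without agent must be positive")
    ratio = activity_with / activity_without
    if ratio < 1.0 - tolerance:
        return "inhibitor"   # lower activity in the presence of the agent
    if ratio > 1.0 + tolerance:
        return "stimulator"  # higher activity in the presence of the agent
    return "no effect"
```

In practice, replicate measurements and a statistical test would be a more defensible basis for the classification than a fixed tolerance.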

Therapeutic Applications 

The AviIII polypeptides of the invention are effective in aiding in delivery or targeting of other 
pharmaceutical compositions within a host. For example, AviIII polypeptides may be used where 
carbohydrate-mediated liposomal interactions are involved with target cells. See Vyas SP et al. 
(2001), J. Pharm. Pharm. Sci. May-Aug 4(2): 138-58. 

AviIII polynucleotides and polypeptides, including vectors expressing AviIII, of the invention can
be formulated as pharmaceutical compositions and administered to a host, preferably a mammalian
host, including a human patient, in a variety of forms adapted to the chosen route of
administration. The compounds are preferably administered in combination with a
pharmaceutically acceptable carrier, and may be combined with or conjugated to specific delivery
agents, including targeting antibodies and/or cytokines.

AviIII can be administered by known techniques, such as orally, parenterally (including
subcutaneous injection, intravenous, intramuscular, intrasternal or infusion techniques), by
inhalation spray, topically, by absorption through a mucous membrane, or rectally, in dosage unit
formulations containing conventional non-toxic pharmaceutically acceptable carriers, adjuvants
or vehicles. Pharmaceutical compositions of the invention can be in the form of suspensions or
tablets suitable for oral administration, nasal sprays, creams, or sterile injectable preparations, such
as sterile injectable aqueous or oleaginous suspensions, or suppositories.

For oral administration as a suspension, the compositions can be prepared according to
techniques well-known in the art of pharmaceutical formulation. The compositions can contain
microcrystalline cellulose for imparting bulk, alginic acid or sodium alginate as a suspending
agent, methylcellulose as a viscosity enhancer, and sweeteners or flavoring agents. As immediate
release tablets, the compositions can contain microcrystalline cellulose, starch, magnesium
stearate and lactose or other excipients, binders, extenders, disintegrants, diluents and lubricants
known in the art.

For administration by inhalation or aerosol, the compositions can be prepared according to
techniques well-known in the art of pharmaceutical formulation. The compositions can be
prepared as solutions in saline, using benzyl alcohol or other suitable preservatives, absorption
promoters to enhance bioavailability, fluorocarbons or other solubilizing or dispersing agents
known in the art.

For administration as injectable solutions or suspensions, the compositions can be formulated
according to techniques well-known in the art, using suitable dispersing or wetting and
suspending agents, such as sterile oils, including synthetic mono- or diglycerides, and fatty acids,
including oleic acid.

For rectal administration as suppositories, the compositions can be prepared by mixing with a
suitable non-irritating excipient, such as cocoa butter, synthetic glyceride esters or polyethylene
glycols, which are solid at ambient temperatures, but liquefy or dissolve in the rectal cavity to
release the drug.

Preferred administration routes include oral and parenteral routes, as well as intravenous,
intramuscular or subcutaneous routes. More preferably, the compounds of the present invention are
administered parenterally, i.e., intravenously or intraperitoneally, by infusion or injection.

Solutions or suspensions of the compounds can be prepared in water, isotonic saline (PBS) and
optionally mixed with a nontoxic surfactant. Dispersions may also be prepared in glycerol, liquid
polyethylene glycols, DNA, vegetable oils, triacetin and mixtures thereof. Under ordinary
conditions of storage and use, these preparations may contain a preservative to prevent the growth
of microorganisms.

The pharmaceutical dosage form suitable for injection or infusion use can include sterile, aqueous
solutions or dispersions or sterile powders comprising an active ingredient which are adapted for
the extemporaneous preparation of sterile injectable or infusible solutions or dispersions. In all
cases, the ultimate dosage form should be sterile, fluid and stable under the conditions of
manufacture and storage. The liquid carrier or vehicle can be a solvent or liquid dispersion
medium comprising, for example, water, ethanol, a polyol such as glycerol, propylene glycol, or
liquid polyethylene glycols and the like, vegetable oils, nontoxic glyceryl esters, and suitable
mixtures thereof. The proper fluidity can be maintained, for example, by the formation of
liposomes, by the maintenance of the required particle size, in the case of dispersion, or by the
use of nontoxic surfactants. The prevention of the action of microorganisms can be accomplished
by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol,
sorbic acid, thimerosal, and the like. In many cases, it will be desirable to include isotonic
agents, for example, sugars, buffers, or sodium chloride. Prolonged absorption of the injectable
compositions can be brought about by the inclusion in the composition of agents delaying
absorption, for example, aluminum monostearate hydrogels and gelatin.

Sterile injectable solutions are prepared by incorporating the compounds in the required amount
in the appropriate solvent with various other ingredients as enumerated above and, as required,
followed by filter sterilization. In the case of sterile powders for the preparation of sterile
injectable solutions, the preferred methods of preparation are vacuum drying and freeze-drying
techniques, which yield a powder of the active ingredient plus any additional desired ingredient
present in the previously sterile-filtered solutions.

Industrial Applications 

The AviIII polypeptides of the invention are effective cellulases. In the methods of the invention,
the cellulose degrading effects of AviIII are achieved by treating biomass at a ratio of
AviIII:biomass of about 1 to about 50, or about 1:40, 1:35, 1:30, 1:25, 1:20, or even about 1:70 in
some preparations of the AviIII. AviIII may be used under extreme conditions, for example, elevated
temperatures and acidic pH. Treated biomass is degraded into simpler forms of carbohydrates,
and in some cases glucose, which is then used in the formation of ethanol or other industrial
chemicals, as is known in the art. Other methods are envisioned to be within the scope of the
present invention, including methods for treating fabrics to remove cellulose-containing stains
and other methods already discussed. AviIII polypeptides can be used in any known application
currently utilizing a cellulase, all of which are within the scope of the present invention.
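
As a rough illustration of the enzyme:biomass ratios recited above, the following Python sketch converts a ratio of 1:N into an enzyme loading for a given batch of biomass. The text does not state the basis of the ratio, so a mass-to-mass basis is assumed here; the function name and the 1000 g batch size are likewise hypothetical.

```python
def enzyme_mass_for_ratio(biomass_mass_g: float, biomass_parts: float) -> float:
    """Enzyme mass (g) for an enzyme:biomass ratio of 1:biomass_parts.

    Assumes the ratio is expressed on a mass-to-mass basis, which the
    text does not specify.
    """
    if biomass_mass_g < 0 or biomass_parts <= 0:
        raise ValueError("biomass mass must be >= 0 and ratio parts must be > 0")
    return biomass_mass_g / biomass_parts


# Ratios named in the text, written as the biomass side of 1:N.
RATIOS = (20, 25, 30, 35, 40, 50, 70)

# Enzyme loadings for a hypothetical 1000 g batch of biomass.
loadings = {parts: enzyme_mass_for_ratio(1000.0, parts) for parts in RATIOS}
```

Under these assumptions, a 1:50 ratio corresponds to 20 g of enzyme preparation per 1000 g of biomass.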

Having generally described the invention, the same will be more readily understood by reference
to the following examples, which are provided by way of illustration and are not intended as
limiting.

EXAMPLES 

Example 1: Molecular Cloning of AviIII

Genomic DNA was isolated from Acidothermus cellulolyticus and purified by banding on cesium
chloride gradients. Genomic DNA was partially digested with Sau 3A and separated on agarose
gels. DNA fragments in the range of 9-20 kilobase pairs were isolated from the gels. This
purified Sau 3A digested genomic DNA was ligated into the Bam HI acceptor site of purified
EMBL3 lambda phage arms (Clontech, San Diego, Calif.). Phage DNA was packaged according
to the manufacturer's specification and plated with E. coli LE392 in top agar which contained the
soluble cellulose analog, carboxymethylcellulose (CMC). The plates were incubated overnight
(12-24 hours) to allow transfection, bacterial growth, and plaque formation. Plates were stained
with Congo Red followed by destaining with 1 M NaCl. Lambda plaques harboring
endoglucanase clones showed up as unstained plaques on a red background.

Lambda clones which screened positive on CMC-Congo Red plates were purified by successive
rounds of picking, plating and screening. Individual phage isolates were named SL-1, SL-2, SL-3,
and SL-4. Subsequent subcloning efforts employed the SL-3 clone which contained an
approximately 14.2 kilobase fragment of Acidothermus cellulolyticus genomic DNA.

Template DNA was constructed using a 9 kilobase Bam HI fragment obtained from the 14.2
kilobase lambda clone SL-3 prepared from Acidothermus cellulolyticus genomic DNA. The 9
kilobase Bam HI fragment from SL-3 was subcloned into pDR540 to generate a plasmid,
NREL501. NREL501 was sequenced by the primer walking method as is known in the art.
NREL501 was then subcloned into pUC19 using restriction enzymes Pst I and Eco RI and
transformed into E. coli XL1-Blue (Stratagene) for the production of template DNA for
sequencing. Each subclone was sequenced from both the forward and reverse directions. DNA
for sequencing was prepared from an overnight growth in 500 mL LB broth using a megaprep
DNA purification kit from Promega. The template DNA was PEG precipitated and suspended
in de-ionized water and adjusted to a final concentration of 0.25 milligrams/mL.

Custom primers were designed by reading upstream known sequence and selecting segments of
an appropriate length to function, as is well known in the art. Primers for cycle sequencing were
synthesized at the Macromolecular Resources Facility located at Colorado State University in
Fort Collins, Colorado. Typically the sequencing primers were 26 to 30 nucleotides in length,
but were sometimes longer or shorter to accommodate a melting temperature appropriate for
cycle sequencing. The sequencing primers were diluted in de-ionized water, the concentration
measured using UV absorbance at 260 nm, and then adjusted to a final concentration of 5
pmol/microL.
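
Adjusting a measured primer stock to the final concentration of 5 pmol/microL is a simple C1·V1 = C2·V2 dilution. The sketch below shows that arithmetic in Python; the function name and the example stock values are hypothetical, and the stock concentration is assumed to have already been derived from the A260 reading.

```python
def diluent_volume_to_add(stock_conc: float, stock_volume: float,
                          target_conc: float) -> float:
    """Volume of diluent to add so that stock_volume at stock_conc reaches target_conc.

    Uses C1 * V1 = C2 * V2. Both concentrations must share one unit
    (e.g. pmol/microL); the result is in the unit of stock_volume.
    """
    if target_conc <= 0 or stock_volume < 0:
        raise ValueError("target concentration must be > 0 and volume >= 0")
    if stock_conc < target_conc:
        raise ValueError("stock is already more dilute than the target")
    final_volume = stock_conc * stock_volume / target_conc
    return final_volume - stock_volume


# Hypothetical example: 10 microL of primer measured at 20 pmol/microL,
# adjusted to the 5 pmol/microL working concentration named in the text.
water_to_add = diluent_volume_to_add(20.0, 10.0, 5.0)
```

The same relation applies to adjusting template DNA to 0.25 milligrams/mL, provided both concentrations are expressed in the same units.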

Templates and sequencing primers were shipped to the Iowa State University DNA Sequencing
Facility at Ames, Iowa for sequencing using standard chemistries for cycle sequencing. In some
cases, regions of the template that sequenced poorly using the standard protocols and dye
terminators were repeated with the addition of 2 microL DMSO and by using nucleotides
optimized for the sequencing of high GC content DNA. An inverse PCR technique known in the
art was applied to continue sequencing the genomic DNA, and a primer walking method was
used to sequence the large PCR products. Each PCR fragment was sequenced from both strands,
using high fidelity commercial DNA polymerase.

Sequencing data from primer walking and subclones were assembled together to verify that all
SL-3 regions had been sequenced from both strands. An open reading frame (ORF), termed
AviIII, was found in the 9 kilobase Bam HI fragment, C-terminal of E1 (U.S. Patent 5,536,655). An
ORF of 3366 bp [SEQ ID NO:2] and deduced amino acid sequence [SEQ ID NO:1] are shown in
Tables 1 and 2. The amino acid sequence predicted by SEQ ID NO:1 was determined to have
significant homology to known cellulases, as is shown below in Example 2 and Table 3.

The amino acid sequence represents a novel member of the family of proteins with cellulase
activity. Due to the source of isolation, from the thermophilic Acidothermus cellulolyticus, AviIII
is a novel member of cellulases with properties including thermal tolerance. It is also known that
thermal tolerant enzymes may have other properties (see definition above).



Example 2: AviIII includes a GH74 catalytic domain

Sequence alignments and comparisons of the amino acid sequences of the Acidothermus
cellulolyticus AviIII catalytic domain (approximately amino acids 37 to 776) and Aspergillus
aculeatus Avicelase III (endoglucanase) polypeptides were prepared, using the ClustalW program
(Thompson J.D. et al. (1994), Nucleic Acids Res. 22:4673-4680, from the EMBL European
Bioinformatics Institute website (http://www.ebi.ac.uk/)). An examination of the amino acid
sequence alignment of the GH74 domain indicates that the amino acid sequence of the AviIII
catalytic domain is homologous to the amino acid sequence of a known GH74 family catalytic
domain, that of Aspergillus aculeatus Avicelase III (endoglucanase) (see Table 3). In Table 3, the
notations are as follows: an asterisk "*" indicates identical or conserved residues in all sequences
in the alignment; a colon ":" indicates conserved substitutions; a period "." indicates
semi-conserved substitutions; and a hyphen "-" indicates a gap in the sequence. The amino acid
sequence predicted for the AviIII GH74 domain is approximately 46% identical to the
Aspergillus aculeatus Avicelase III (endoglucanase) GH74 domain, indicating that the AviIII
catalytic domain is a member of the GH74 family (Henrissat et al., (1991) supra).
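
To make the identity figure concrete, the following Python sketch counts identical residues over the aligned columns of a pairwise alignment. It is not the ClustalW scoring procedure: the function names are hypothetical, and the choice to count only columns where neither sequence has a gap is an assumption, since other denominators are also in common use. The two fragments are the final alignment block as printed in Table 3.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns are excluded from the comparison
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def identity_line(aligned_a: str, aligned_b: str, gap: str = "-") -> str:
    """Marker line with '*' under identical columns (the ':' and '.' classes are not modelled)."""
    return "".join("*" if a == b and a != gap else " "
                   for a, b in zip(aligned_a, aligned_b))


# Final alignment block as printed in Table 3.
GH74_ACE_FRAGMENT = "LRRVYIGTNGRGIVYGDIGGAPSG"
AVIIII_AAC_FRAGMENT = "YGRVFRGHERPGHLLRQSQREPAG"

fragment_identity = percent_identity(GH74_ACE_FRAGMENT, AVIIII_AAC_FRAGMENT)
```

The value for this short C-terminal fragment is not expected to match the approximately 46% reported for the whole GH74 domain, because identity varies along the alignment.
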

Table 3. Multiple amino acid sequence alignment of an AviIII catalytic domain and
polypeptides with Glycoside Hydrolase Family 74 catalytic domains.

Multiple alignment of related Glycoside Hydrolase Family 74 catalytic domains

GH74_Ace: Acidothermus cellulolyticus AviIII catalytic domain GH74

AviIII_Aac: Aspergillus aculeatus Avicelase III (endoglucanase), GenBank Acc. # BAA29031



GH74_Ace   ATTQPYTWSNVAIGGGG-FVDGIVFNEGAPGILYVRTDIGGMYRWDAANGRWIPLLDWVG
AviIII_Aac AASQAYTWKNWTGGGGGFTPGIVFNPSAKGVAYARTDIGGAYRLNSDD-TWTPLMDWVG
           *:;*^***^**^ **** * *. * ****** **

GH74_Ace   WNNWGYNGWSIAADPINTNKVWAAVG^T!fTNSWDPNDGAILRSSDQGATWQITPLPFKLG
AviIII_Aac NDTWHDWGIDALATDPVDTDRVYVAVGMYTNEWDPNVGSILRSTDQGDTWTETKLPFKVG
           :.* *: ::*:**::*::*:.*******.**** *;****.*** ** * ****.*

GH74_Ace   GNMPGRGMGERLAVDPNNDNILYFGAPSGKGLWRSTDSGATWSQMTNFPDVGTYIANPTD
AviIII_Aac GNMPGRGMGERLAVDPNKNSILYFGARSGHGLWKSTDYGATWSNVTSFTWTGTYFQDSSS
           *****************..****** **.***.*** *****..* *^ .***:

GH74_Ace   TTGYQSDIQGWWVAFDKSSSSLGQASKTIFVGVADPNNPVFWSRDGGATWQAVPGAP-T
AviIII_Aac T--YTSDPVGIAWVTFDSTSGSSGSATPRIFVGVADAGKSVFKSEDAGATWAWVSGEPQY
           * * ** *:.**:**.:*.* *.*; ******* . ** * * **** * * *

GH74_Ace   GFIPHKGVFDPVNHVLYIATSNTGGPYDGSSGDVWKFSVTSGTWTRISPVPSTDTANDYF
AviIII_Aac GFLPHKGVLSPEEKTLYISYANGAGPYDGTNGTVHKYNITSGVWTDISP---TSLASTYY
           **:*****:.* ::.***: ;* ,**★**. ^* * *. .*** ** *** *^

GH74_Ace   GYSGLTIDRQHPNTIMVATQISWWPDTIIFRSTDGGATWTRIWDWTSYPNRSLRYVLDIS
AviIII_Aac GYGGLSVDLQVPGTLMVAALNCWWPDELIFRSTDSGATWSPIWEWNGYPSINYYYSYDIS
           **^**.j* * *_*.***. **** .**********. **.*^^**^ ^ * ***

GH74_Ace   AEPWLTFGVQPNPPVPSPKLGWMDEAMAIDPFKSDRMLYGTGATLYATNDLTKWDSGGQI
AviIII_Aac NAPWIQDTTSTDQFP--VRVGWlWEALAIDPFDSmWLYGTGLTVYGGHDLTNWDSKHNV
           **: ::*** **.*****.*.. ***** *.*^ .***.***

GH74_Ace   HIAPMVKGLEETAVNDLISPPSGAPLISALGDLGGFTHADVTAVPSTIFTSPVFTTGTSV
AviIII_Aac TVKSLAVGIEEMAVLGLITPPGGPALLSAVGDDGGFYHSDLDAAPNQAYHTPTYGTTNGI
           : .:. *:** ** ,**;**,*.,*.**.** *** *;*; *.*. ; *

GH74_Ace   DYAELNPSIIVRAGSFDPSSQPNDRHVAFSTDGGKNWFQGSEPGGVTTGGTVAASADGSR
AviIII_Aac DYAGNKPSNI VRSGASDDYP TLALSSWFGSTWYADYAASTSTGTGAVALSADGDT
           *** :** ***:*. * ^ :*:*:: *..*: . .. * *;** ****^

GH74_Ace   FVWAPGDPGQPWYAVGFGNSWAASQGVPANAQIRSDRVNPKTFYALSNGTFYRSTDGGV
AviIII_Aac VLLMSSTSGALVSKSQG TLTAVSSLPSGAVIASDKSDNTVFYGGSAGAIYVSKNTAT
           .:...**:* ; :* * ** ? ■ ** * *•-* * .

GH74_Ace   TFQPVAAGLPSSGAVGVMFHAVPGKEGDLWLAASSGLYHSTNGGSSWSAI-TGVSSAVNV
AviIII_Aac SFTKTVS-LGSSTTVNAIR-AHPSIAGDVWASTDKGLWHSTDYGSTFTQIGSGVTAGWSF
           :* ..: * ** :*.,: * *. **.* :;..**:***. **;:; * :**:;.

GH74_Ace   GFGKSAPGSSYPAVFWGTIGGVTGAYRSDDCGTTWVLINDDQHQYGN-WGQAITGDHAN
AviIII_Aac GFGKASSTGSYWIYGFFTIDGAAGLFKSEDAGTNWQVISDASHGFGSGSANWNGDLQT
           ****::. .** .:: . **^*_.* ..*.**** ;*_* ^* . . **

GH74_Ace   LRRVYIGTNGRGIVYGDIGGAPSG
AviIII_Aac YGRVFRGHERPGHLLRQSQREPAG
           ** . * - * . . * . *



Example 3: Mixed Domain GH74, CBD II, CBD III Genes and Hybrid Polypeptides 

From the putative locations of the domains in the AviIII cellulase sequence given above and in
comparable cloned cellulase sequences from other species, one can separate individual domains
and combine them with one or more domains from different sequences. The significant similarity
between cellulase genes permits one, by recombinant techniques, to arrange one or more domains
from the Acidothermus cellulolyticus AviIII cellulase gene with one or more domains from a
cellulase gene from one or more other microorganisms. Other representative endoglucanase
genes include Bacillus polymyxa beta-(1,4) endoglucanase (Baird et al., Journal of Bacteriology,
172: 1576-86 (1992)) and Xanthomonas campestris beta-(1,4)-endoglucanase A (Gough et al.,
Gene 89:53-59 (1990)). The result of the fusion of any two or more domains will, upon
expression, be a hybrid polypeptide. Such hybrid polypeptides can have one or more catalytic or
binding domains. For ease of manipulation, recombinant techniques may be employed such as the
addition of restriction enzyme sites by site-specific mutagenesis. If one is not using one domain
of a particular gene, any number of any type of change including complete deletion may be made
in the unused domain for convenience of manipulation.
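
The domain shuffling described above can be pictured at the sequence level with a short Python sketch. It only illustrates slicing annotated domains out of parent polypeptides and joining them in a chosen order; the parent sequences, domain names, and coordinates are invented placeholders, not AviIII data, and real constructs would be assembled at the DNA level with attention to reading frame and linker regions.

```python
from typing import NamedTuple, Sequence, Tuple


class Domain(NamedTuple):
    name: str
    start: int  # 1-based position of the first residue, inclusive
    end: int    # 1-based position of the last residue, inclusive


def extract_domain(sequence: str, domain: Domain) -> str:
    """Return the residues of `domain` from `sequence` (1-based, inclusive)."""
    if not 1 <= domain.start <= domain.end <= len(sequence):
        raise ValueError(f"domain {domain.name} lies outside the sequence")
    return sequence[domain.start - 1:domain.end]


def build_hybrid(parts: Sequence[Tuple[str, Domain]]) -> str:
    """Concatenate the selected domains, in order, into one hybrid polypeptide."""
    return "".join(extract_domain(seq, dom) for seq, dom in parts)


# Hypothetical toy parents and domain annotations, for illustration only.
PARENT_A = "MKTAAAACCCCGGGG"
PARENT_B = "MSTDDDDEEEEFFFF"
CATALYTIC_A = Domain("catalytic", 4, 7)
BINDING_B = Domain("binding", 12, 15)

hybrid = build_hybrid([(PARENT_A, CATALYTIC_A), (PARENT_B, BINDING_B)])
```

Keeping the coordinates 1-based and inclusive matches the way domain boundaries are quoted in the text (for example, approximately amino acids 37 to 776 for the AviIII catalytic domain).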

It is understood for purposes of this disclosure that various changes and modifications may be
made to the invention that are well within the scope of the invention. Numerous other changes
may be made which will readily suggest themselves to those skilled in the art and which are
encompassed in the spirit of the invention disclosed herein and as defined in the appended claims.

This specification contains numerous citations to references such as patents, patent applications,
and publications. Each is hereby incorporated by reference for all purposes.




